Machine Translation between Hebrew and Arabic: Needs, Challenges and Preliminary Solutions
Abstract
Modern Hebrew and Modern Standard Arabic, both Semitic languages, share many orthographic, lexical, morphological, syntactic and semantic similarities, but they are still not mutually comprehensible. Most native Hebrew speakers in Israel do not speak Arabic, and the vast majority of Arabs (outside Israel) do not speak Hebrew. Machine translation (MT) between these two languages has the potential to bridge political and cultural differences and bring the disputing peoples in the Middle East somewhat closer together by helping them better understand each other’s societies. The dominant paradigm in contemporary MT (Brown et al., 1990) relies on large-scale parallel corpora from which correspondences between the two languages can be extracted. However, such abundant parallel corpora currently exist for only a few language pairs, and low- and medium-density languages (Varga et al., 2005) require alternative approaches. Specifically, no parallel corpora exist for Hebrew–Arabic. As an alternative to the pure statistical approach, we are currently developing a Hebrew-to-Arabic MT system, using the Stat-XFER framework (Lavie, 2008), which is particularly suited for low-resource language pairs. We discuss some linguistic properties of the two languages and describe the implications for MT of their similarities and, in particular, their differences. We then discuss possible solutions to these challenges, advocating a linguistically-aware, transfer-based approach. Finally, we describe the system we are developing and report some preliminary results.
Similar resources
Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results
Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair. Previous work relied on manually-crafted grammars or pivoting via English, both of which are unsatisfactory for building a scalable and accurate MT system. In this work, we compare standard phrase-based and neural systems on Ar...
Transfer-based Machine Translation between morphologically-rich and resource-poor languages: The case of Hebrew and Arabic
A Comparative Analysis of Collocation in Arabic-English Translations of the Glorious Quran
The Qur’an is the only holy book of Muslims all around the world. People of every religion and language are interested in comprehending and accepting the rules and regulations of their own belief. Translation of the Qur’an is only an attempt to present its meaning. One of the greatest challenges in translating the Qur’an is collocation. A collocation is a sequence of words or terms that co-o...
Edinburgh SLT and MT System Description for the IWSLT 2014 Evaluation
This paper describes the University of Edinburgh’s spoken language translation (SLT) and machine translation (MT) systems for the IWSLT 2014 evaluation campaign. In the SLT track, we participated in the German↔English and English→French tasks. In the MT track, we participated in the German↔English, English→French, Arabic↔English, Farsi→English, Hebrew→English, Spanish↔English, and Portuguese-Br...